This document highlights some insights garnered from data obtained through SIM DAC’s marketing efforts. The results reflected are from the collective efforts and dedication of SIM DAC’s members - both committee and non-committee.
This section analyses the club’s Facebook page.
Before we begin looking at the data, some context should be provided. The Facebook page has been a channel for marketing the club's events and also a way to communicate with people interested in learning more about the club.
To begin with, we have to find out when the page was created. However, the 'About' section of the page states the date on which DAC was founded, not the date of the page's creation.
A quick Google search reveals that a page's creation date cannot be obtained from the dashboard, so another approach has to be adopted to get a 'starting point' for analysing the page. To get a sense of where to begin looking, I checked when the page's only profile picture was added.
As seen above, the profile picture was uploaded on the 2nd of February, 2015 (Monday). We can start looking for signs of initial activity from 2015 Q1.
As an ‘Admin’ of the page, I am able to export .csv files from the page’s ‘Insights’ section.
Several .csv files for the quarters of 2015 and 2016 were exported - both page and post level data. The page level data set presents insights on the page’s general activity and performance, while the post level data set goes deeper into the posts’ performance. The files can be downloaded through a Dropbox link here.
Examining the .csv file in Excel, I observed that the first several rows contain only null values. However, from the row dated 26th of January, 2015 (Monday) onwards, values are recorded. Hence, it is reasonable to assume that the club's Facebook page was created on that day.
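The same 'first activity' check can be done programmatically rather than by eyeballing rows in Excel. A minimal sketch on synthetic data (the `Likes` column name is illustrative, not the exact export header):

```r
# Synthetic stand-in for the export: a run of empty days, then activity
df <- data.frame(
  Date  = as.Date("2015-01-20") + 0:9,
  Likes = c(NA, NA, NA, NA, NA, NA, 3L, 5L, 2L, 4L)
)
# Index of the first row where any metric column has a recorded value
firstActive <- min(which(rowSums(!is.na(df[ , -1, drop = FALSE])) > 0))
df$Date[firstActive]
## [1] "2015-01-26"
```

With the real export, the same `rowSums(!is.na(...))` scan over the metric columns would land on the 26th of January, 2015 without any manual inspection.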
Before we jump into analysing the data sets, some data cleaning (mostly reconciling the number of variables) has to be done. Further elaboration is given over here. I exported the two cleaned data tables into .csv files (downloadable from the aforementioned Dropbox link), so if you are following along but skipped the data cleaning part, you can simply import those .csv files into your environment.
library(data.table)
# For Page Level data
dacFBpage <- fread(input = "SIM DAC Facebook Insights Data Export 2016-12-24 (Page Level - Cleaned).csv")
# For Post Level data
dacFBpost <- fread(input = "SIM DAC Facebook Insights Data Export 2016-12-24 (Post Level - Cleaned).csv")
Now that we have imported those data sets, it is time for analysis - level by level.
dacFBpage
We first check the structure of the page level data set.
str(dacFBpage, list.len = 5)
## Classes 'data.table' and 'data.frame': 726 obs. of 450 variables:
## $ Date : chr "2014-12-31" "2015-01-01" "2015-01-02" "2015-01-03" ...
## $ Lifetime Total likes : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Daily New likes : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Daily Unlikes : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Daily Page engaged users : int 0 0 0 0 0 0 0 0 0 0 ...
## [list output truncated]
## - attr(*, ".internal.selfref")=<externalptr>
The list.len = 5 argument in the call above truncates what would otherwise be a very long output (the data table has 450 variables). As shown below, all of dacFBpage's variables are of the 'integer' class except for the first one.
# Storing the class of each variable into a data table
pageVarsClass <- as.data.table(sapply(dacFBpage, class))
# Checking the distribution of classes
table(pageVarsClass$V1)
##
## character integer
## 1 449
# Which column is currently regarded as a 'character' variable
which(pageVarsClass == "character")
## [1] 1
Checking the character length of each value under the Date column.
table(nchar(dacFBpage$Date))
##
## 10
## 726
Since all values under the Date column are of uniform length, we can now convert the variable from ‘character’ into a ‘Date’ class. The values are formatted as “YYYY-MM-DD” (e.g. 2014-12-31).
class(dacFBpage$Date)
## [1] "character"
dacFBpage$Date <- as.Date(x = dacFBpage$Date, format = "%Y-%m-%d")
class(dacFBpage$Date)
## [1] "Date"
We have decided the starting point for analysis to be the 26th of January, 2015, so we subset the data set to be from then onwards.
startPoint <- as.Date("2015-01-26", format = "%Y-%m-%d")
dacFBpage <- dacFBpage[Date >= startPoint]
Now, we will check for elements in the data table that contain an empty string (""), a white space (" ") or NAs.
table(dacFBpage[,2:ncol(dacFBpage)] == " ")
##
## FALSE
## 270808
table(dacFBpage[,2:ncol(dacFBpage)] == "")
##
## FALSE
## 270808
table(is.na(dacFBpage[,2:ncol(dacFBpage)]))
##
## FALSE TRUE
## 270808 43492
For these columns, an 'NA' value can safely be imputed with '0': it is safe to assume that the NAs are not due to non-availability of information, and that '0's are measurable values.
dacFBpage[is.na(dacFBpage)] <- 0
table(is.na(dacFBpage))
##
## FALSE
## 315000
Now, we can finally do the analysis.
We will start off by plotting the movement of the page's number of likes throughout its existence.
library(plotly)
pageLikesPlotly <- plot_ly(data = dacFBpage,
                           x = ~Date,
                           y = ~`Lifetime Total likes`,
                           type = 'scatter',
                           mode = 'lines')
pageLikesPlotly
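Since the 'Lifetime Total likes' curve is just the cumulative sum of daily new likes minus unlikes, a rolling mean of the daily net change can surface growth spurts that the cumulative curve smooths over. A toy sketch with made-up numbers (the real series would come from the Daily New likes and Daily Unlikes columns):

```r
# Made-up daily figures standing in for 'Daily New likes' / 'Daily Unlikes'
newLikes <- c(2, 0, 1, 5, 3, 0, 2, 4, 1, 0, 6, 2, 1, 3)
unlikes  <- c(0, 0, 1, 0, 2, 0, 0, 1, 0, 0, 1, 0, 0, 1)
netLikes <- newLikes - unlikes
# Cumulative total reproduces a 'Lifetime Total likes'-style series
lifetime <- cumsum(netLikes)
# Trailing 7-day rolling mean of net likes (sides = 1 uses past values only)
rolling7 <- stats::filter(netLikes, rep(1/7, 7), sides = 1)
tail(lifetime, 1)
## [1] 24
```

Plotting `rolling7` alongside `lifetime` (e.g. with another plot_ly trace) would show when growth accelerated, for instance around event announcements.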
More plots/analyses to come…
dacFBpost
The Post Level data touches on deeper insights for each published post on the page's timeline. From extent of reach to clicks, we will look into the different insights we can garner from the post level data here.
Structure of the post level data…
str(dacFBpost, list.len = 7)
## Classes 'data.table' and 'data.frame': 180 obs. of 34 variables:
## $ Post ID : chr "332464930295436_357536961121566" "332464930295436_354518904756705" "332464930295436_354518911423371" "332464930295436_354275911447671" ...
## $ Permalink : chr "https://www.facebook.com/dacsim/posts/357536961121566:0" "https://www.facebook.com/dacsim/posts/354518904756705:0" "https://www.facebook.com/dacsim/posts/354518911423371:0" "https://www.facebook.com/dacsim/posts/354275911447671" ...
## $ Post Message : chr "Ever wondered what happens when you merge ANOVA and Regression? \n\nDAC presents to you Analysis of Covariance (ANCOVA) with R "| __truncated__ "Analysis of Variance (ANOVA) with R Programming!" "Analysis of Variance (ANOVA) with R Programming!" "High demand, low supply. Are we going on the right direction?" ...
## $ Type : chr "Photo" "Photo" "Photo" "Link" ...
## $ Countries : logi NA NA NA NA NA NA ...
## $ Languages : logi NA NA NA NA NA NA ...
## $ Posted : chr "03/26/2015 05:00:02 AM" "03/19/2015 05:01:35 PM" "03/19/2015 05:01:35 PM" "03/19/2015 05:01:39 AM" ...
## [list output truncated]
## - attr(*, ".internal.selfref")=<externalptr>
Here we can see that the post level data has more flavour to it, with the additional presence of a potential categorical variable (Type) and a text variable (Post Message). However, we can also see some variables categorised as logical, which does not seem to make sense on the surface. We will investigate further as we go along.
# Storing the class of each variable into a data table
postVarsClass <- as.data.table(sapply(dacFBpost, class))
# Checking the distribution of classes
table(postVarsClass$V1)
##
## character integer logical
## 5 26 3
To investigate which variables are associated with each class, we also create a vector of the variable names so we can easily make the association.
postVarsNames <- names(dacFBpost)
# Which columns are currently regarded as a 'character' variable
postVarsChar <- which(postVarsClass == "character")
# Identifying the names of those columns
postVarsNames[postVarsChar]
## [1] "Post ID" "Permalink" "Post Message" "Type"
## [5] "Posted"
From the output returned to us, the variables that should not be retained as 'character' variables are Type (to categorical) and Posted (to date).
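As an aside, sapply() preserves column names, so the class lookup and the name lookup can be collapsed into a single step rather than indexing a separate names vector. A small sketch on a toy data frame:

```r
# sapply() keeps column names, so filtering by class is a one-liner
toy <- data.frame(id = "a", n = 1L, flag = NA, stringsAsFactors = FALSE)
charCols <- names(which(sapply(toy, is.character)))
charCols
## [1] "id"
```

Applied to dacFBpost, `names(which(sapply(dacFBpost, is.character)))` would return the same five column names in one call.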
# Which columns are currently regarded as a 'logical' variable
postVarsLogic <- which(postVarsClass == "logical")
# Identifying the names of those columns
postVarsNames[postVarsLogic]
## [1] "Countries" "Languages" "Audience targeting"
Judging from the output, it would make more sense for these to be regarded as categorical variables, so why could they possibly be regarded as logical variables?
table(is.na(dacFBpost$Countries))
##
## TRUE
## 180
table(is.na(dacFBpost$Languages))
##
## TRUE
## 180
table(is.na(dacFBpost$`Audience targeting`))
##
## TRUE
## 180
Oh. That's why. Because these variables contain only NA values, the parser falls back to the logical class (NA is a valid logical value alongside TRUE and FALSE). Since they contain nothing but NAs, they are redundant.
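This parser behaviour is easy to reproduce. A sketch using base read.csv on an inline string (fread behaves the same way on the real export):

```r
# A column whose fields are all empty is parsed as an all-NA 'logical' column
csvText <- "a,b\n1,\n2,\n3,\n"
toy <- read.csv(text = csvText)
class(toy$b)
## [1] "logical"
```

The moment even one row carried a value (say, a country name), the parser would have promoted the whole column to 'character' instead.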
ncol(dacFBpost)
## [1] 34
dacFBpost <- dacFBpost[ , -(postVarsLogic), with = FALSE]
ncol(dacFBpost)
## [1] 31
Now, on to converting the Posted variable into a proper date and time format. We shall first check that the character lengths of the values under the Posted column are uniform.
table(nchar(dacFBpost$Posted))
##
## 22
## 180
They are. Now we use the IDateTime function for conversion. The values are formatted as "MM/DD/YYYY HH:MM:SS AM/PM" (e.g. 03/26/2015 05:00:02 AM).
# Currently, variable contains character strings
class(dacFBpost$Posted)
## [1] "character"
# Create a vector of strings converted into POSIXlt/POSIXt
# We cannot simply overwrite into the same column as data tables do not allow POSIXlt formatted elements
postDateTime <- strptime(x = dacFBpost$Posted, format = "%m/%d/%Y %I:%M:%S %p")
# Creating a data table containing formatted dates and times of the posts
postDateTimeDTab <- IDateTime(postDateTime)
(postDateTimeDTab)
## idate itime
## 1: 2015-03-26 05:00:02
## 2: 2015-03-19 17:01:35
## 3: 2015-03-19 17:01:35
## 4: 2015-03-19 05:01:39
## 5: 2015-03-14 05:02:35
## ---
## 176: 2016-10-20 02:00:00
## 177: 2016-10-19 03:00:00
## 178: 2016-10-18 04:53:22
## 179: 2016-10-06 22:08:57
## 180: 2016-10-05 21:00:00
# Combine the data tables
dacFBpost <- cbind(dacFBpost,postDateTimeDTab)
# Rename the newly added variables appropriately
names(dacFBpost)[names(dacFBpost) %in% c("idate", "itime")] = c("Date", "Time")
# Get rid of the 'Posted' variable
dacFBpost$Posted <- NULL
The Type variable should be a categorical variable.
# Current class for variable
class(dacFBpost$Type)
## [1] "character"
# Convert into categorical variable
dacFBpost$Type <- as.factor(dacFBpost$Type)
class(dacFBpost$Type)
## [1] "factor"
# Here we see that the first level is unnamed
levels(dacFBpost$Type)
## [1] "" "Link" "Photo" "Status" "Video"
# We give it a reasonable name
levels(dacFBpost$Type)[1] <- "Others"
levels(dacFBpost$Type)
## [1] "Others" "Link" "Photo" "Status" "Video"
As with the page level data, the variables of integer class can be given '0's in place of their NA values.
table(is.na(dacFBpost))
##
## FALSE TRUE
## 4590 1170
# Identify which columns are of integer class
# Storing the class of each variable into a data table (updated)
postVarsClass <- as.data.table(sapply(dacFBpost, class))
postVarsClass <- postVarsClass[1,]
# Which columns are currently regarded as integer variables
postVarsInt <- which(postVarsClass == "integer")
# Names of variables of integer class
names(postVarsClass)[postVarsInt]
## [1] "Lifetime Post Total Reach"
## [2] "Lifetime Post organic reach"
## [3] "Lifetime Post Total Impressions"
## [4] "Lifetime Post Organic Impressions"
## [5] "Lifetime Engaged users"
## [6] "Lifetime Post consumers"
## [7] "Lifetime Post consumptions"
## [8] "Lifetime Negative feedback"
## [9] "Lifetime Negative feedback from users"
## [10] "Lifetime Post Impressions by people who have liked your Page"
## [11] "Lifetime Post reach by people who like your Page"
## [12] "Lifetime People who have liked your Page and engaged with your post"
## [13] "Lifetime Average time video viewed"
## [14] "Lifetime Organic watches at 95%"
## [15] "Lifetime Video length"
## [16] "Lifetime Organic Video Views"
## [17] "Lifetime Talking About This (Post) by action type - comment"
## [18] "Lifetime Talking About This (Post) by action type - like"
## [19] "Lifetime Talking About This (Post) by action type - share"
## [20] "Lifetime Post stories by action type - comment"
## [21] "Lifetime Post stories by action type - like"
## [22] "Lifetime Post stories by action type - share"
## [23] "Lifetime Post consumers by type - link clicks"
## [24] "Lifetime Post consumers by type - other clicks"
## [25] "Lifetime Post Consumptions by type - link clicks"
## [26] "Lifetime Post Consumptions by type - other clicks"
# Imputing the '0's
dacFBpost[is.na(dacFBpost)] <- 0
table(is.na(dacFBpost))
##
## FALSE
## 5760
For our first analysis, we will plot the post count for each quarter of 2015 and 2016.
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
dacFBpost$YearQuarter <- as.Date(as.yearqtr(dacFBpost$Date))
postGroupedQuarter <- dacFBpost[ , .N, by = YearQuarter]
postCountPlot <- plot_ly(data = postGroupedQuarter,
                         x = ~YearQuarter,
                         y = ~N,
                         type = 'scatter',
                         mode = 'lines')
postCountPlot
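The quarter bucketing itself does not strictly require zoo; base R can derive the same year-quarter key, which makes for a quick cross-check of the grouping. A sketch on a few synthetic dates:

```r
# Base-R year-quarter key, as a cross-check on zoo::as.yearqtr
dates <- as.Date(c("2015-01-26", "2015-03-31", "2015-04-01", "2016-12-24"))
yq <- paste0(format(dates, "%Y"),
             " Q", (as.integer(format(dates, "%m")) - 1L) %/% 3L + 1L)
yq
## [1] "2015 Q1" "2015 Q1" "2015 Q2" "2016 Q4"
```

Tabulating this key (e.g. with `table(yq)`) should agree with the `.N` counts produced by the data.table grouping above.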